Optimal Scheduling of Information Extraction Algorithms

Authors

Henning Wachsmuth

Benno Stein

Abstract

Most research on run-time efficiency in information extraction is of empirical nature. This paper analyzes the efficiency of information extraction pipelines from a theoretical point of view in order to explain empirical findings. We argue that information extraction can, at its heart, be viewed as a relevance filtering task whose efficiency traces back to the run-times and selectivities of the employed algorithms. To better understand the intricate behavior of information extraction pipelines, we develop a sequence model for scheduling a pipeline’s algorithms. In theory, the most efficient schedule corresponds to the Viterbi path through this model and can hence be found by dynamic programming. For real-time applications, it might be too expensive to compute all run-times and selectivities beforehand. However, our model implies the benchmarks of filtering tasks and illustrates that the optimal schedule depends on the distribution of relevant information in the input texts. We give formal and experimental evidence where necessary.

Title and abstract in German

Optimales Scheduling von Information-Extraction-Verfahren

Nahezu alle Forschung zur Laufzeiteffizienz in der Information Extraction ist empirischer Natur. Die vorliegende Arbeit analysiert die Effizienz von Information-Extraction-Pipelines aus theoretischer Sicht, um empirische Erkenntnisse zu erklären. Wir sehen Information Extraction im Kern als Relevanz-Filteraufgabe an, deren Effizienz auf die Laufzeiten und Selektivitäten der eingesetzten Algorithmen zurückgeht. Zum besseren Verständnis des komplexen Verhaltens von Information-Extraction-Pipelines entwickeln wir ein Sequenzmodell für das Scheduling der Algorithmen einer Pipeline. Theoretisch entspricht der effizienteste Schedule dem Viterbi-Pfad durch dieses Modell und lässt sich daher mittels dynamischer Programmierung finden. Für Echtzeitanwendungen kann es zu teuer sein, alle Laufzeiten und Selektivitäten im Vorhinein zu berechnen. Unser Modell impliziert jedoch die Benchmarks von Filteraufgaben und zeigt, dass der optimale Schedule von der Verteilung relevanter Informationen in den Eingabetexten abhängt. Wo nötig, führen wir sowohl formale als auch experimentelle Belege an.
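
The abstract ties pipeline efficiency to the run-times and selectivities of the algorithms and states that the best schedule can be found by dynamic programming. The following is a minimal sketch of that idea under simplifying assumptions, not the authors' sequence model: each algorithm is assumed to have a fixed run-time per unit of text and an independent selectivity (the fraction of text that stays relevant after it has run), and some algorithms require others to run first. The function `best_schedule`, the algorithm names, and all numbers are hypothetical and chosen only for illustration.

```python
def best_schedule(runtimes, selectivities, requires):
    """Find the cheapest admissible order of pipeline algorithms.

    runtimes:      name -> assumed run-time per unit of input text
    selectivities: name -> assumed fraction of text still relevant afterwards
    requires:      name -> names that must run earlier (input dependencies)

    Dynamic programming over sets of already-run algorithms: the cost of
    running an algorithm next is its run-time times the fraction of text
    that the algorithms run so far have left as relevant.
    """
    names = sorted(runtimes)
    index = {name: i for i, name in enumerate(names)}
    full = (1 << len(names)) - 1

    # best[mask] = (expected run-time, order) for the set encoded by mask
    best = {0: (0.0, [])}
    for mask in range(full + 1):  # every subset of a set has a smaller mask
        if mask not in best:      # set not reachable under the dependencies
            continue
        cost, order = best[mask]

        # Fraction of the input that is still relevant after this set ran.
        remaining = 1.0
        for i, name in enumerate(names):
            if mask >> i & 1:
                remaining *= selectivities[name]

        for i, name in enumerate(names):
            if mask >> i & 1:
                continue  # already scheduled
            if any(not (mask >> index[r] & 1) for r in requires.get(name, ())):
                continue  # a required predecessor has not run yet
            new_mask = mask | (1 << i)
            new_cost = cost + runtimes[name] * remaining
            if new_mask not in best or new_cost < best[new_mask][0]:
                best[new_mask] = (new_cost, order + [name])

    if full not in best:
        raise ValueError("dependencies cannot be satisfied")
    return best[full]


# Hypothetical pipeline: all values below are made up for illustration.
runtimes = {"tokenizer": 1.0, "time_tagger": 2.0,
            "money_tagger": 3.0, "relation_extractor": 10.0}
selectivities = {"tokenizer": 1.0, "time_tagger": 0.5,
                 "money_tagger": 0.2, "relation_extractor": 0.1}
requires = {"time_tagger": {"tokenizer"},
            "money_tagger": {"tokenizer"},
            "relation_extractor": {"time_tagger", "money_tagger"}}

cost, order = best_schedule(runtimes, selectivities, requires)
print(order, cost)
```

In this toy setting the slower `money_tagger` is scheduled before the faster `time_tagger` because it filters out much more text, which shows how the best order depends on selectivities and not on run-times alone. With different selectivities, that is, a different distribution of relevant information in the input, the same algorithms can end up in a different order.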

Similar resources

An Optimal Utilization of Cloud Resources using Adaptive Back Propagation Neural Network and Multi-Level Priority Queue Scheduling

With innovation in the cloud computing industry, many services are provided based on different deployment criteria. Nowadays everyone tries to remain connected and demands maximum utilization of resources with minimum time and effort, which makes optimum utilization of resources an important challenge in cloud computing. To overcome this issue, many techniques have been proposed ...

Appling Metaheuristic Algorithms on a Two Stage Hybrid Flowshop Scheduling Problem with Serial Batching (RESEARCH NOTE)

In this paper, the problem of serial batch scheduling in a two-stage hybrid flow shop environment with the objective of minimizing makespan is investigated. In serial batching it is assumed that jobs in a batch are processed serially, and their completion time is defined to be equal to the finishing time of the last job in the batch. The analysis and implementation of the prohibited transference of jobs among th...

Improved teaching–learning-based and JAYA optimization algorithms for solving flexible flow shop scheduling problems

Flexible flow shop (or a hybrid flow shop) scheduling problem is an extension of classical flow shop scheduling problem. In a simple flow shop configuration, a job having ‘g’ operations is performed on ‘g’ operation centres (stages) with each stage having only one machine. If any stage contains more than one machine for providing alternate processing facility, then the problem...

Two meta-heuristic algorithms for parallel machines scheduling problem with past-sequence-dependent setup times and effects of deterioration and learning

This paper considers identical parallel machines scheduling problem with past-sequence-dependent setup times, deteriorating jobs and learning effects, in which the actual processing time of a job on each machine is given as a function of the processing times of the jobs already processed and its scheduled position on the corresponding machine. In addition, the setup time of a job on each machin...

Pre-scheduling and Scheduling of Task Graph on Homogeneous Multiprocessor Systems

Task graph scheduling is a multi-objective, NP-hard optimization problem. In this paper a new algorithm for homogeneous multiprocessor systems is proposed. Basically, scheduling algorithms aim to balance the two parameters of time and energy consumption. Up to a certain limit, these two parameters conflict with each other, and improvement of one causes reduction in the othe...

Publication date: 2012
